Occurrence and Substring Heuristics for δ-Matching

Authors

  • Maxime Crochemore
  • Costas S. Iliopoulos
  • Thierry Lecroq
Abstract

We consider a version of pattern matching useful in processing large musical data: δ-matching, which consists in finding matches that are δ-approximate in the sense of a distance measured as the maximum difference between symbols. The alphabet is an interval of integers, and the distance between two symbols a, b is measured as |a − b|. We also consider (δ, γ)-matching, where γ is a bound on the total sum of the differences. We first consider “occurrence heuristics” by adapting exact string matching algorithms to the two notions of approximate string matching. The resulting algorithms are efficient in practice. Then we consider “substring heuristics”. We present δ-matching algorithms that are fast on average provided that the pattern is “non-flat” and the alphabet interval is large. The pattern is “flat” if its structure does not vary substantially. The algorithms, named δ-BM1, δ-BM2 and δ-BM3, can be thought of as members of the generalized Boyer-Moore family of algorithms. This is the first paper on the subject; previously only “occurrence heuristics” have been considered. Our substring heuristics are much stronger and refer to larger parts of texts (not only to single positions). We use δ-versions of suffix tries and subword graphs. Surprisingly, in the context of δ-matching, subword graphs appear to be superior to compact suffix trees.

The work of these authors was partially supported by NATO grant PST.CLG.977017. The work of this author was partially supported by Welcome foundation, Royal Society and EPSRC grants.
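The two matching notions in the abstract can be illustrated with a brute-force check. This is only a minimal sketch of the definitions, not one of the paper's occurrence or substring heuristics; the function name `delta_gamma_matches` and the sample pitch values are assumptions made for illustration.

```python
def delta_gamma_matches(text, pattern, delta, gamma=None):
    """Return start positions where `pattern` delta-matches `text`.

    A window matches when every symbol differs from the aligned pattern
    symbol by at most `delta`. If `gamma` is given, the sum of those
    differences must also be at most `gamma` ((delta, gamma)-matching).
    Brute force: O(n * m) comparisons in the worst case.
    """
    n, m = len(text), len(pattern)
    positions = []
    for i in range(n - m + 1):
        total = 0
        ok = True
        for j in range(m):
            d = abs(text[i + j] - pattern[j])
            if d > delta:
                ok = False  # a single symbol is too far away
                break
            total += d
            if gamma is not None and total > gamma:
                ok = False  # accumulated difference exceeds the bound
                break
        if ok:
            positions.append(i)
    return positions


# Hypothetical integer-coded pitches.
text = [60, 62, 64, 65, 67, 65, 64]
pattern = [61, 63, 64]
print(delta_gamma_matches(text, pattern, delta=1))           # [0, 1]
print(delta_gamma_matches(text, pattern, delta=1, gamma=2))  # [0]
```

Position 1 is a δ-match for δ = 1 but its differences sum to 3, so it is rejected once γ = 2 is imposed.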


Similar resources


kmacs: the k-Mismatch Average Common Substring Approach for Phylogeny Reconstruction

The vast majority of sequence comparison methods for phylogeny reconstruction rely on pairwise or multiple sequence alignments. These approaches are in practice not usable for longer sequences such as complete genomes. For this reason alignment-free methods have recently become more popular because they are much faster and usually computable in linear time. Some of these methods are based on re...


Space-Time Trade-Offs for the Shortest Unique Substring Problem

Given a string X[1, n] and a position k ∈ [1, n], the Shortest Unique Substring of X covering k, denoted by Sk, is a substring X[i, j] of X which satisfies the following conditions: (i) i ≤ k ≤ j, (ii) i is the only position where there is an occurrence of X[i, j], and (iii) j − i is minimized. The best-known algorithm [Hon et al., ISAAC 2015] can find Sk for all k ∈ [1, n] in time O(n) using t...
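Conditions (i) to (iii) can be checked directly by brute force. The sketch below only illustrates the definition and is not the algorithm of Hon et al. or of this paper; it uses 0-based indices instead of the 1-based indices in the abstract, and the function name is hypothetical. When several shortest unique substrings cover k, it returns the leftmost one.

```python
def shortest_unique_substring(x, k):
    """Return (i, j), inclusive and 0-based, of a shortest substring of `x`
    that covers position k and occurs exactly once in `x`.

    Brute force over increasing lengths; intended for small inputs only.
    """
    n = len(x)
    for length in range(1, n + 1):
        # Start positions i with i <= k <= i + length - 1 that fit in x.
        for i in range(max(0, k - length + 1), min(k, n - length) + 1):
            sub = x[i:i + length]
            # Count occurrences, overlapping ones included.
            occurrences = sum(
                1 for p in range(n - length + 1) if x[p:p + length] == sub
            )
            if occurrences == 1:
                return (i, i + length - 1)
    return None  # only reached for an empty string or k out of range


print(shortest_unique_substring("abab", 1))  # (1, 2), i.e. "ba"
```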


Abelian pattern matching in strings

Abelian pattern matching is a new class of pattern matching problems. In abelian patterns, the order of the characters in the substrings does not matter, e.g. the strings abbc and babc represent the same abelian pattern a+2b+c. Therefore, unlike classical pattern matching, we do not look for an exact (ordered) occurrence of a substring, rather the aim here is to find any permutation of a given ...
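A common way to find permutations of a pattern is to slide a window over the text and compare character counts. The sketch below assumes that approach for illustration; it is not taken from the paper, and the function name `abelian_occurrences` is hypothetical.

```python
from collections import Counter


def abelian_occurrences(text, pattern):
    """Return start positions of windows of `text` that are permutations
    of `pattern`, using a sliding window of character counts."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = Counter(pattern)
    window = Counter(text[:m])
    hits = [0] if window == target else []
    for i in range(m, len(text)):
        window[text[i]] += 1      # character entering the window
        out = text[i - m]
        window[out] -= 1          # character leaving the window
        if window[out] == 0:
            del window[out]       # drop zero counts so equality is exact
        if window == target:
            hits.append(i - m + 1)
    return hits


# "babc" and "bcab" are both permutations of "abbc".
print(abelian_occurrences("xbabcab", "abbc"))  # [1, 3]
```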


Compact Recognizers of Episode Sequences

Mikhail J. Atallah, Purdue University. Given two strings T = a_1 ... a_n and P = b_1 ... b_m over an alphabet Σ, the problem of testing whether P occurs as a subsequence of T is trivially solved in linear time. It is also known that a simple O(n log |Σ|) time preprocessing of T makes it easy to decide subsequently for any P and in at most |P| log |Σ| character comparisons, whether P is a subsequence o...
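The linear-time subsequence test mentioned above can be sketched with a single left-to-right scan of T. This shows only the basic test, not the preprocessing-based method or the recognizers of the paper; the function name is hypothetical.

```python
def is_subsequence(p, t):
    """Return True if `p` occurs as a subsequence of `t`.

    Scans `t` once, advancing through `p` whenever the next needed
    character is seen, so the running time is O(len(t)).
    """
    j = 0  # index of the next character of p still to be matched
    for ch in t:
        if j < len(p) and ch == p[j]:
            j += 1
    return j == len(p)


print(is_subsequence("ace", "abcde"))  # True
print(is_subsequence("aec", "abcde"))  # False
```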



Journal:
  • Fundam. Inform.

Volume 56

Publication date: 2003